Randomization-based Inferences about Latent Variables from Complex Samples

Abstract 

Standard  procedures  for  drawing  inferences  from  complex 
samples do not apply when the variable of interest $\theta$ cannot be
observed directly, but must be inferred from the values of
secondary random variables that depend on $\theta$ stochastically.
Examples are examinee proficiency variables in item response
theory models and class memberships in latent class models. This
paper uses Rubin's "multiple imputation" approach to approximate
sample statistics that would have been obtained, had $\theta$ been
observable. Associated variance estimates account for uncertainty
due to both the sampling of respondents from the population and
the latency of $\theta$. The approach is illustrated with artificial
examples and with data from the 1984 National Assessment of
Educational Progress reading survey.


Key words: Complex samples, item response theory, latent
structure, missing data, multiple imputation,
National Assessment of Educational Progress, sample
surveys.
Introduction

Latent-variable  models  are  used  in  the  social  sciences  to 
provide  parsimonious  explanations  of  associations  among  observed 
variables  in  terms  of  theoretical  constructs.  Practical  benefits 
can  accrue  as  well,  as  when  examinees  who  have  been  presented 
different  test  items  are  compared  using  item  response  theory  (IRT) 
psychometric  models,  or  when  consumer  satisfaction  levels  are 
tracked  over  time  with  different  survey  questions  by  means  of  a 
latent  class  model. 

This  paper  addresses  the  problem  of  estimating  the 
distributions  of  latent  variables  in  finite  populations,  when  the 
data  are  obtained  in  complex  sampling  designs.  The  solution  it 
offers  is  to  apply  Rubin's  (1987)  "multiple  imputations" 
procedures  for  missing  data  in  sample  surveys.  This  approach 
provides  consistent  estimates  of  population  characteristics, 
supports  statements  of  precision  that  account  for  both  the 
sampling  of  subjects  and  the  latency  of  variables,  and  produces 
filled-in  pseudo  datasets  that  are  easy  for  secondary  researchers 
to analyze correctly.

The  following  sections  briefly  review  randomization-based 
inference  about  manifest  variables  from  complex  samples,  and 
multiple  imputation  for  nonresponse.  The  next  sections  apply 
these ideas to latent-variable measurement models and illustrate
them  with  an  example  using  classical  test  theory.  Computing 
approximations are then discussed. The paper concludes by
sketching  the  implementation,  the  results,  and  the  lessons  learned 
in  applying  the  procedures  to  the  1984  reading  survey  of  the 
National Assessment of Educational Progress (NAEP).

Drawing  Inferences  from  Complex  Samples 
Standard  analyses  of  sample  survey  data  (e.g.,  Cochran,  1977) 
provide  estimates  of  population  characteristics  and  associated 
sampling  variances  when  the  values  of  survey  variables  are 
observed  from  each  respondent  in  the  realized  sample.  This 
section  gives  background  and  notation  for  the  analysis  of  survey 
data,  first  when  all  responses  are  in  fact  observed,  then  when 
some  are  missing. 

Randomization-Based Inference with Complete Data

Consider  a  population  of  N  identifiable  units,  indexed  by  the 
subscript i. Associated with each unit are two possibly vector-valued
variables Y and Z. The values of the design variables, Z,
are known for all units before observations are made, but the
values of the survey variables, Y, are not. Let (Y,Z) denote the
population matrix of values. Interest lies in the value of a
function S = S(Y,Z) of the population values; examples include a
population  total  for  an  element  of  Y,  a  subpopulation  mean  for  an 
element  of  Y  given  specified  values  of  elements  of  Z,  and  the 
linear  regression  coefficients  of  some  elements  of  Y  on  others. 


Any of these functions could be calculated directly if values of Y
and Z were observed for all units in the population.

There are $2^N$ possible subsets of units, and a sample design
assigns  a  probability  to  each.  A  simple  random  sample  of  size 
100,  for  example,  assigns  equal  probability  to  all  subsets  of  100 
units,  and  zero  to  all  other  subsets.  A  sample  design  yields  a 
"complex  sample"  when  it  exhibits  one  or  more  of  the  following 
features:  unequal  probabilities  of  selection  for  different  units; 
stratification,  which  ensures  prespecified  rates  of  representation 
in  the  sample  according  to  values  of  Z;  or  clustering,  which  uses 
values  of  Z  to  link  selection  probabilities  of  units  when  their 
joint  occurrence  facilitates  gathering  the  data  (e.g.,  it  is 
easier  to  interview  two  respondents  in  the  same  town  than  two  in 
different towns).

Randomization-based inference about S is based on the
distribution of a statistic s = s(y) calculated using y, the Y
values of a subset of units selected in accordance with a
prespecified sample design. In practical work, sample designs are
usually constructed so as to yield a nearly unbiased statistic s
and an estimate of its variance in the form of another statistic
U = U(y). Inferences are then drawn using the normal approximation

$$(s - S) \sim N(0, U) .$$
Under the randomization approach to inference in sample
surveys,  population  values  Y  are  taken  as  fixed  unknown 
quantities,  and  the  notion  of  randomness  enters  only  in  the 
selection  of  a  sample  in  accordance  with  the  sample  design.  Under 
the  model-based  approach,  in  contrast,  Y  is  viewed  as  a  realized 
sample  from  a  hypothetical  "superpopulation"  under  which  variables 
are  distributed  in  accordance  with  some  model  p(y|z)  (Cassel, 
Särndal, and Wretman, 1977). The randomization approach dominates
current  practice,  and  will  be  used  in  the  sequel  to  deal  with 
uncertainty  due  to  sampling  subjects.  It  will  be  seen  that 
superpopulation  concepts  are  required  nonetheless  to  handle 
missingness  and  latency. 

Randomization-Based  Inference  with  Incomplete  Data 

In  practice,  values  of  one  or  more  survey  variables  will  not 
be  observed  from  some  subjects  in  the  realized  sample,  for 
reasons  that  may  or  may  not  be  related  to  the  values  that  would 
have been observed. For any respondent, let the partitioning
$y = (y_{mis}, y_{obs})$ indicate the elements of the survey variables that
were missing and observed. Much progress has been made extending
inferential procedures to survey data with missing responses when
those responses are missing at random (MAR); that is, the
probability that the elements in $y_{mis}$ are missing may depend on
the values of $y_{obs}$ and z, but beyond that, not on the value of
$y_{mis}$ (Little and Rubin, 1987; Rubin, 1976).
Let $(\mathbf{y}_{mis}, \mathbf{y}_{obs})$ be the matrices of missing and observed
survey variables in a realized sample. The values of $\mathbf{y}_{obs}$--which
may comprise different elements of y for different subjects--are
now known, but the values of $\mathbf{y}_{mis}$ are not. It is not possible to
calculate s directly, but it may be possible to calculate its
conditional expectation:

$$E[s(\mathbf{y}) \mid \mathbf{y}_{obs}, \mathbf{z}] = \int s(\mathbf{y}_{mis}, \mathbf{y}_{obs}) \, p(\mathbf{y}_{mis} \mid \mathbf{y}_{obs}, \mathbf{z}) \, d\mathbf{y}_{mis} . \qquad (1)$$

The predictive density $p(\mathbf{y}_{mis} \mid \mathbf{y}_{obs}, \mathbf{z})$ expresses the extent of
knowledge about what the missing responses might have been, given
the observed responses and the survey variables. If MAR holds,
the predictive distribution of a missing element for subject i--
say $y_{i,mis}$--is approximated by the responses to that element from
subjects who did respond to it and have the same z values as
subject i and the same responses on the elements in $y_{i,obs}$. In
large surveys with few missing values, one can use these empirical
distributions directly. The Census Bureau's "hot deck" imputation
procedure (see Ford, 1983), for example, assumes that predictive
densities are independent over respondents--i.e.,

$$p(\mathbf{y}_{mis} \mid \mathbf{y}_{obs}, \mathbf{z}) = \prod_i p(y_{i,mis} \mid y_{i,obs}, z_i) ,$$



and  calculates  s  with  each  person's  observed  responses  and,  for 
those  he  or  she  is  missing,  draws  from  the  actual  responses  of 
suitably  similar  respondents. 
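
The hot-deck idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the Census Bureau's actual procedure: the function name `hot_deck_impute`, the record layout, and the rule that "suitably similar" means an exact match on the listed fields are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, match_keys, target, rng):
    """Return copies of `records` with missing `target` values filled in.

    A missing value (None) is replaced by the observed value of a randomly
    chosen donor that has the same values on every field in `match_keys`.
    """
    # Pool the observed responses of potential donors by matching cell.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[tuple(rec[k] for k in match_keys)].append(rec[target])

    completed = []
    for rec in records:
        filled = dict(rec)  # leave the input records untouched
        if filled[target] is None:
            pool = donors.get(tuple(rec[k] for k in match_keys))
            if not pool:
                raise ValueError("no donor matches this respondent")
            filled[target] = rng.choice(pool)
        completed.append(filled)
    return completed


# Hypothetical toy sample: `stratum` plays the role of z, `score` of y.
sample = [
    {"id": 1, "stratum": "A", "score": 3.0},
    {"id": 2, "stratum": "A", "score": 5.0},
    {"id": 3, "stratum": "A", "score": None},
    {"id": 4, "stratum": "B", "score": 9.0},
    {"id": 5, "stratum": "B", "score": None},
]
completed = hot_deck_impute(sample, ["stratum"], "score", random.Random(0))
```

Each missing score is drawn from the empirical distribution of observed scores in the same cell, which is the sense in which the predictive density is approximated by the responses of similar respondents.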
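Alternatively, one can posit a functional form for
$p(\mathbf{y}_{mis} \mid \mathbf{y}_{obs}, \mathbf{z})$, such as a regression model with parameters $\beta$. It
is usually reasonable to assume independence over respondents, but
now conditional on the unknown parameters $\beta$; that is,

$$p(\mathbf{y}_{mis} \mid \mathbf{y}_{obs}, \mathbf{z}, \beta) = \prod_i p(y_{i,mis} \mid y_{i,obs}, z_i, \beta) . \qquad (2)$$

In this case, the appropriate predictive distribution is obtained
after marginalizing with respect to $\beta$:

$$p(\mathbf{y}_{mis} \mid \mathbf{y}_{obs}, \mathbf{z}) = \int p(\mathbf{y}_{mis} \mid \mathbf{y}_{obs}, \mathbf{z}, \beta) \, p(\beta \mid \mathbf{y}_{obs}) \, d\beta , \qquad (3)$$

where $p(\beta \mid \mathbf{y}_{obs})$ is the posterior distribution of $\beta$ after $\mathbf{y}_{obs}$ has
been observed.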

Multiple  Imputation  for  Missing  Responses 
As  with  the  hot  deck,  one  can  obtain  a  rough  numerical 
approximation of (1)--an unbiased estimate of the expectation of
s--by  filling  in  each  missing  response  with  a  random  draw  from  its 
predictive  density,  then  calculating  s  as  if  these  imputations 
were  the  true  response  values.  An  analogous  approximation  of  U 
using  these  single  imputations  underestimates  the  variability  of 
the  resulting  estimate  of  S,  however.  It  accounts  for  uncertainty 
due  to  sampling  subjects,  but  not  uncertainty  due  to  imputing 
values  for  missing  responses.  To  remedy  this  deficiency,  Rubin 
suggests  multiple  imputations.  When  the  statistic  s  is  scalar  and 
the predictive distribution for $\mathbf{y}_{mis}$ is model-based, one proceeds
as follows:

1. Determine the posterior distribution $p(\beta \mid \mathbf{y}_{obs})$.

2. Produce M distinct "completed" datasets $\mathbf{y}_{(m)}$, m=1,...,M.
Each looks like a complete dataset; it has the values $\mathbf{y}_{obs}$
for the responses that were observed, and, for those that
were not, a draw from (3). Two steps complete $\mathbf{y}_{(m)}$:

a. Draw a value $\beta_{(m)}$ from $p(\beta \mid \mathbf{y}_{obs})$.

b. For each respondent with one or more missing responses,
draw a value from $p(y_{i,mis} \mid y_{i,obs}, z_i, \beta = \beta_{(m)})$. Taken
together, the resulting imputation for $y_{i,mis}$ and the
observed value of $y_{i,obs}$ constitute $y_{i(m)}$, the
"completed" response of subject i.

3. Using each completed dataset, calculate $s_{(m)} = s(\mathbf{y}_{(m)})$ and
$U_{(m)} = U(\mathbf{y}_{(m)})$.

4. The final estimate of S is the average of the M estimates
from the completed datasets:

$$s_M = \sum_{m=1}^{M} s_{(m)} / M . \qquad (4)$$

5. The estimated sampling variance of $s_M$ as an estimate of S,
say $V_M$, is the sum of two components:

$$V_M = U_M + (1 + M^{-1}) B_M , \qquad (5)$$

where

$$U_M = \sum_{m=1}^{M} U_{(m)} / M$$

quantifies uncertainty due to sampling subjects, and

$$B_M = \sum_{m=1}^{M} (s_{(m)} - s_M)^2 / (M-1) ,$$

the variance among the estimates of s from the M completed
datasets, quantifies uncertainty due to missing responses
from the sampled subjects.

6. If inferences about S would have been based on $(s-S) \sim N(0,U)$
had all responses been observed, inferences are now based on

$$(s_M - S) \sim t_\nu(0, V_M) ,$$

a t-distribution, with degrees of freedom given by

$$\nu = (M-1)(1 + r_M^{-1})^2 , \qquad (6)$$

where $r_M$ is the proportional increase in variance due to
missingness:

$$r_M = (1 + M^{-1}) B_M / U_M .$$

When $B_M$ is small relative to $U_M$, $\nu$ is large and the normal
approximation to the t-distribution suffices.
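
Steps 4 through 6 amount to a few lines of arithmetic once the M completed-data estimates and their variance estimates are in hand. The sketch below is one possible rendering for scalar s; the function name `pool_scalar`, the dictionary keys, and the input numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def pool_scalar(estimates, variances):
    """Combine M completed-data results for a scalar statistic.

    `estimates` holds s_(m) and `variances` holds U_(m), m = 1..M.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need M >= 2 estimates and one variance per estimate")
    s_bar = mean(estimates)          # equation (4)
    u_bar = mean(variances)          # average within-imputation variance U_M
    b = variance(estimates)          # between-imputation variance B_M, divisor M-1
    inflate = 1.0 + 1.0 / m
    v = u_bar + inflate * b          # equation (5)
    r = inflate * b / u_bar          # proportional increase in variance r_M
    # Equation (6); with no between-imputation variance the t reduces to a normal.
    nu = (m - 1) * (1.0 + 1.0 / r) ** 2 if r > 0 else float("inf")
    return {"s_M": s_bar, "U_M": u_bar, "B_M": b, "V_M": v, "r_M": r, "nu": nu}


# Hypothetical inputs: three completed-data estimates with equal variances.
pooled = pool_scalar([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
```

A secondary analyst would call such a routine once per statistic of interest, after running the usual complete-data analysis on each of the M completed datasets.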

For k-dimensional s, such as the vector of coefficients in a
multiple regression analysis, each $U_{(m)}$ and $U_M$ are covariance
matrices, and $B_M$ is an average of squares and cross products
rather than simply an average of squares. Then the quantity

$$(S - s_M) \, V_M^{-1} \, (S - s_M)' \qquad (7)$$

is approximately F-distributed with k and $\nu$ degrees of freedom,
with $\nu$ defined as above but with a matrix generalization of $r_M$:

$$r_M = (1 + M^{-1}) \, \mathrm{Trace}(B_M U_M^{-1}) / k .$$

By the same reasoning as used for the normal approximation for
scalar s, a chi-square distribution on k degrees of freedom often
suffices.

Example  1:  A  Numerical  Illustration 

Consider a large population in which the scalar y is
distributed $N(\mu, 1)$, and it is desired to estimate $\mu$ from an SRS of
size 12. If y values for all 12 units in the realized sample were
observed, inferences about $\mu$ would be based on $\bar{y}$, the sample mean,
and U, the squared standard error of the mean (SEM$^2$). U is 1/12,
since the population variance is known to be one and the sample size is 12. In particular,
$(\bar{y} - \mu) \sim N(0, 1/12)$. Suppose, though, that the values for the last
two sampled units are missing (at random). The first ten are
observed to take the values shown below:

-.797, -.176, 1.419, .029, -1.107,

1.794, -1.619, 1.104, .418, .161.

These observations comprise $\mathbf{y}_{obs}$, while $\mathbf{y}_{mis}$ is $(y_{11}, y_{12})$.

Using  multiple  imputations  to  estimate  the  population  mean  is 
accomplished  as  follows. 

Since we are assuming MAR and there are no collateral or
design variables, the distributions of $y_{11}$ and $y_{12}$ are simply
$N(\mu, 1)$ too. What we know about $\mu$ from the first ten observations
is conveyed by their mean and the SEM$^2$; with an indifference
prior, $p(\mu \mid \mathbf{y}_{obs}) \sim N(.123, .100)$.

A completed dataset consists of the ten observed y values and
an imputation for $(y_{11}, y_{12})$. For the sake of illustration, five
completed datasets will be constructed. The procedure for each
consists of two steps:

a. A value $\mu_{(m)}$ is drawn from $p(\mu \mid \mathbf{y}_{obs})$, or N(.123, .1).

b. Imputations $y_{11(m)}$ and $y_{12(m)}$ are drawn from $N(\mu_{(m)}, 1)$.

A  table  of  random  normal  deviates  gave  the  following  results: 
m    $\mu_{(m)}$    $y_{11(m)}$    $y_{12(m)}$

1    .306     -.095      -.801
2    .238     -.599       .644
3    .033     -.029       .510
4    .089      .386     -1.248
5    .001      .205      2.106

From each completed dataset, an estimate $\bar{y}_{(m)}$ is calculated,
the mean of ten observed y values and two imputations. The
results are .028, .106, .143, .031, and .295. For each m,
$U_{(m)} = 1/12$, an SEM$^2$ appropriate for twelve observations from a
normal population with a variance known to be one.

The estimate of $\mu$ based on the imputed values is the
average of the five completed-data estimates, or .121. Not
surprisingly, this is close to the estimate .123 based on $\mathbf{y}_{obs}$,
since that is its expected value. Indeed, $\bar{y}_M$ would converge to
.123 as M increased. The estimated sampling variance of $\bar{y}_M$ is
obtained by (5) as

$$V_M = \sum_m U_{(m)} / 5 + (1 + 5^{-1}) \sum_m (\bar{y}_{(m)} - \bar{y}_M)^2 / 4 = 1/12 + 1.2 \times .012 = .098 .$$

This value is very close to 1/10, the SEM$^2$ that corresponds to a
sample of size 10--as it should, since this is what the data
actually are.

Randomization-based inferences about $\mu$ using the multiple
imputations are based on

$$(.121 - \mu) \sim t_{184}(0, .098) ,$$

a t-distribution with degrees of freedom obtained by (6) with $r_M$,
the proportional increase in variance due to the missingness of
$y_{11}$ and $y_{12}$, equal to .173.
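
The arithmetic of Example 1 can be retraced directly from the ten observed values and the five pairs of tabled imputations. The short script below is a sketch for checking the figures, with variable names chosen here for illustration. Because it carries unrounded intermediate values, its results agree with the reported .121, .098, and .173 only to about two or three decimals, and the degrees of freedom can differ from 184 by a unit or so.

```python
from statistics import mean, variance

# The ten observed y values from the example.
observed = [-0.797, -0.176, 1.419, 0.029, -1.107,
            1.794, -1.619, 1.104, 0.418, 0.161]

# The tabled imputations (y_11(m), y_12(m)) for m = 1..5.
imputations = [(-0.095, -0.801), (-0.599, 0.644), (-0.029, 0.510),
               (0.386, -1.248), (0.205, 2.106)]

M = len(imputations)

# Step 3: completed-data estimates; U_(m) = 1/12 for every m.
completed_means = [mean(observed + list(pair)) for pair in imputations]
U_M = 1.0 / 12.0

# Steps 4-6: pooled estimate, total variance, and degrees of freedom.
y_M = mean(completed_means)
B_M = variance(completed_means)          # divisor M - 1
V_M = U_M + (1.0 + 1.0 / M) * B_M
r_M = (1.0 + 1.0 / M) * B_M / U_M
nu = (M - 1) * (1.0 + 1.0 / r_M) ** 2

print(round(y_M, 3), round(V_M, 3), round(r_M, 3), round(nu))
```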

Latent  Variables  and  Sample  Surveys 
This section provides notation and background for latent-variable
models, and shows how they can be handled in complex
samples  with  multiple  imputations. 


Latent-Variable  Models 


Latent  variables  in  educational  and  psychological  measurement 
models  account  for  regularities  in  observable  data;  for  example, 
examinees'  tendencies  to  give  correct  responses  to  test  items,  or 
respondents'  inclinations  toward  liberal  responses  on  social 
questions. The probability that a subject with latent parameter $\theta$
will make the response $x_{(j)}$ to Item j is modeled as a function of
$\theta$ and a possibly vector-valued item parameter $\beta_j$, as $p(x_{(j)} \mid \theta, \beta_j)$.
The assumption of local or conditional independence posits that
the latent variable accounts for associations among responses to
various items in a specified domain; i.e., if $x = (x_{(1)}, \ldots, x_{(n)})$
is a vector of responses to n items, then

$$p(x \mid \theta, \beta) = \prod_{j=1}^{n} p(x_{(j)} \mid \theta, \beta_j) , \qquad (8)$$

where $\beta = (\beta_1, \ldots, \beta_n)$. Moreover, the latent variable is typically
assumed to account for associations between response variables and
collateral subject variables such as demographic or educational
standing. Denoting collateral variables by y and z to anticipate
developments below, the extension of local independence posits

$$p(x \mid \theta, \beta, y, z) = p(x \mid \theta, \beta) . \qquad (9)$$

When  such  a  model  is  found  to  provide  an  adequate  fit  to 
data,  observing  the  responses  to  any  subset  of  items  induces  a 
likelihood function for $\theta$ through (8), and it becomes possible to
draw inferences about individual values of $\theta$ or about their
distributions in populations even though different subjects
respond to different items. This capability is particularly
attractive in educational measurement, as when an IRT model can be
used to customize tests to examinees adaptively, or to update the
item pools in educational assessments over time.

If the focus is on measuring individuals for placement or
selection decisions, enough items can be administered to each to
make the likelihood function for his or her $\theta$ peak sharply. A
satisfactorily precise point estimate of each $\theta$, such as the MLE $\hat\theta$
or the Bayes mean estimate $\bar\theta = E(\theta \mid x)$, can then be obtained. If
the focus is on the parameters $\alpha$ of a population distribution of
$\theta$, however, these locally optimal point estimates can be decidedly
nonoptimal. For a fixed test of fixed length, their distributions
do not converge to the true distribution of $\theta$ as examinee samples
increase. It becomes necessary to estimate $\alpha$ directly, bypassing
the intermediary stage of calculating point estimates for
individuals.

Suppose the data matrix x consisted of response vectors
$(x_1, \ldots, x_N)$ of an SRS of respondents. The starting point for the
direct estimation of the parameters $\alpha$ of the density function
$p(\theta \mid \alpha)$ in this context would be the marginal probability:

$$p(x \mid \alpha, \beta) = \prod_{i=1}^{N} \int p(x_i \mid \theta, \beta) \, p(\theta \mid \alpha) \, d\theta . \qquad (10)$$

A  number  of  recent  papers  in  the  statistical  and  psychometric 
literature  show  how  to  obtain  Bayesian  or  maximum  likelihood 
estimates of $\alpha$ and/or $\beta$ using (10) (e.g., Andersen and Madsen,
1977; Bock and Aitkin, 1981; Dempster, Laird, and Rubin, 1977;
Laird, 1978; Mislevy, 1984, 1985; Sanathanan and Blumenthal, 1978;
and  Rigdon  and  Tsutakawa,  1983).  As  they  are  presented,  however, 
these  methods  are  poorly  suited  to  general  analyses  of  survey  data 
involving  latent  variables.  As  well  as  being  limited  to  SRS  data, 
they  require  advanced  statistical  concepts  and  iterative  computing 
methodologies  that  render  them  inaccessible  to  the  typical 
secondary  analyst. 
Latent Variables as Missing Responses

A  key  insight  for  dealing  with  latent  variables  in  sample 
surveys is to view them as survey variables whose responses are
missing  for  every  respondent  (e.g.,  Dempster,  Laird,  and  Rubin, 
1977,  on  factor  analysis  models).  Their  missingness  satisfies  MAR 
because  they  are  missing  regardless  of  their  values,  and  as  such 
are  amenable  to  the  (relatively)  simple  procedures  for  MAR  missing 
data  described  above.  In  essence,  knowledge  about  subjects' 
latent variables $\theta$ will be represented in the form of predictive
distributions conditional on what can be observed--background
survey variables y, design variables z, and item responses x whose
distributions are assumed to be governed by $\theta$.

Suppose the object of inference is the scalar $S = S(\theta, X, Y, Z)$,
some  function  of  the  population  values  of  latent  variables,  item 
response  variables,  background  survey  variables,  or  design 
variables  of  all  units.  Suppose  further  that  a  sample  design  has 
been  specified.  Three  assumptions  are  central  to  drawing 
randomization-based  inferences  about  S.  The  first  is  that  one 
would know how to proceed if values of $\theta$ were observed rather than
latent:

Assumption 1. If values of $\theta$ could be observed from sampled
respondents, along with values of x and y, a sample statistic
$s = s(\theta, x, y)$ and an associated variance estimator $U = U(\theta, x, y)$ would
be available for randomization-based inference about S, via

$$(s - S) \sim N(0, U) .$$
Inferences about S cannot be based on a direct calculation of
s because all values of $\theta$ are missing. As before, however, one
can base inferences on its conditional expectation given what is
known--the sample values x and y and the population values Z--by
adapting (1) as follows:

$$E(s \mid x, y, Z) = \int s(\theta, x, y) \, p(\theta \mid x, y, Z) \, d\theta . \qquad (11)$$

The second and third assumptions are needed to define the
predictive distribution for $\theta$ that appears in (11), namely
$p(\theta \mid x, y, Z)$. These assumptions are embodied in the latent variable
model and a superpopulation structure for distributions of $\theta$ given
background survey variables and design variables.

Assumption 2. Item responses x are governed by the latent
variable $\theta$ through a model of known functional form, $p(x \mid \theta, \beta)$,
characterized by possibly unknown parameters $\beta$ and satisfying the
local independence properties (8) and (9). Independence is
assumed over subjects, so that

$$p(x \mid \theta, \beta, y, Z) = p(x \mid \theta, \beta) = \prod_{i=1}^{N} p(x_i \mid \theta_i, \beta) . \qquad (12)$$

Assumption 3. The distribution of latent variables given
collateral survey variables y and design variables z follows a
known functional form, $p(\theta \mid y, z, \alpha)$, characterized by possibly
unknown parameters $\alpha$. Independence is assumed over subjects, so
that

$$p(\theta \mid y, Z, \alpha) = \prod_{i=1}^{N} p(\theta_i \mid y_i, z_i, \alpha) . \qquad (13)$$

It  is  important  to  note  that  these  distributions  are 
conditioned  on  the  design  variables  z  employed  in  the  sampling 
design.  While  the  sample  design  for  selecting  units  from  the 
existing  population  may  be  complex,  sampling  this  population  from 
the hypothetical superpopulation is SRS given z. This is the
essence of the model-based approach to sampling-survey inference.
It  is  used  in  this  presentation  not  to  handle  uncertainty  due  to 
sampling  examinees,  but  to  build  conditional  distributions  for 
latent  variables,  since  it  opens  the  door  to  the  aforementioned 
methods  of  estimating  the  parameters  of  latent-variable 
distributions  from  SRS  data. 

To see how Assumptions 2 and 3 lead to $p(\theta \mid x, y, Z)$, first
consider some relationships conditional on $\alpha$ and $\beta$. Using Bayes
theorem, then (12) and (13), gives

$$p(\theta \mid x, y, Z, \alpha, \beta) = K_{\alpha\beta} \, p(x \mid \theta, y, Z, \alpha, \beta) \, p(\theta \mid y, Z, \alpha, \beta) = \prod_i K_{i\alpha\beta} \, p(x_i \mid \theta_i, \beta) \, p(\theta_i \mid y_i, z_i, \alpha) ,$$

where the normalizing constants $K_{i\alpha\beta}$ depend on $\alpha$
and $\beta$ but not on $\theta$. Given $\alpha$ and $\beta$, then, the predictive
distribution for the latent variable $\theta_i$ of Subject i, or
$p(\theta_i \mid x_i, y_i, z_i, \alpha, \beta)$, would be obtained by normalizing the product
of (i) the likelihood function induced for $\theta$ by $x_i$ via the latent-
variable model (8), and (ii) the conditional distribution for $\theta$
implied by his or her background and design variables $y_i$ and $z_i$.
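
The "normalize the product" construction can be made concrete on a grid of $\theta$ values. The sketch below is only an illustration under assumptions that go beyond the text: it takes a Rasch (one-parameter logistic) item response model for (i), a normal conditional distribution for (ii), and a fixed, equally spaced grid in place of exact integration. The function names, item difficulties, and response pattern are hypothetical.

```python
import math
import random


def rasch_likelihood(theta, responses, difficulties):
    """Likelihood of a 0/1 response vector under an assumed Rasch model."""
    like = 1.0
    for x, b in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        like *= p if x == 1 else 1.0 - p
    return like


def normal_pdf(theta, mean, sd):
    z = (theta - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def predictive_on_grid(responses, difficulties, cond_mean, cond_sd, grid):
    """Normalized product of likelihood and conditional density on `grid`."""
    weights = [rasch_likelihood(t, responses, difficulties)
               * normal_pdf(t, cond_mean, cond_sd) for t in grid]
    total = sum(weights)  # plays the role of the normalizing constant
    return [w / total for w in weights]


# Hypothetical subject: four items, three answered correctly; the
# conditional distribution implied by (y_i, z_i) is taken to be N(0, 1).
grid = [-4.0 + k / 20.0 for k in range(161)]
difficulties = [-1.0, 0.0, 0.5, 1.0]
probs = predictive_on_grid([1, 1, 0, 1], difficulties, 0.0, 1.0, grid)

post_mean = sum(t * p for t, p in zip(grid, probs))
draws = random.Random(1).choices(grid, weights=probs, k=5)  # five imputations
```

Drawing from the discrete grid is a crude stand-in for sampling the continuous predictive distribution; a finer grid or another sampling scheme would be needed in serious work.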

Multiple  Imputations  for  Latent  Variables 

The  preceding  paragraphs  give  the  framework  for 
randomization-based  inference  with  latent  variables  in  sample 
surveys.  To  operationalize  the  approach  with  multiple  imputations 
requires  specializing  the  procedure  outlined  above  as  follows: 

1. Obtain the posterior distribution of the parameters $\beta$ of the
latent-variable model and $\alpha$ of the conditional distributions
of $\theta$, namely $p(\alpha, \beta \mid x, y, Z)$, by the methods discussed in
connection with (10)--for example, a large sample normal
approximation based on the MLE $(\hat\alpha, \hat\beta)$ and asymptotic
covariance matrix $\Sigma_{\alpha\beta}$.

2. Produce M "completed" datasets $(\theta_{(m)}, x, y)$. For the mth,

a. Draw a value $(\alpha, \beta)_{(m)}$ from $p(\alpha, \beta \mid x, y, Z)$. (If $\alpha$ and $\beta$
have been estimated very precisely, it may be acceptable
to use $(\hat\alpha, \hat\beta)$ for each completed dataset in what is
commonly known as an empirical Bayes approximation.
This expedient introduces a tendency to underestimate
the uncertainty associated with final estimates of S.)

b. For each respondent, draw a value from the predictive
distribution $p(\theta_i \mid x_i, y_i, z_i, (\alpha, \beta) = (\alpha, \beta)_{(m)})$. Taken
together, the resulting imputation for $\theta_i$ and his or her
observed values of $x_i$, $y_i$, and $z_i$ constitute the
"completed" response of Subject i.

3. Using each completed dataset, calculate $s_{(m)} = s(\theta_{(m)}, x, y)$ and
$U_{(m)} = U(\theta_{(m)}, x, y)$.

4. The final estimate of S is the average of the M estimates
from the completed datasets, or $s_M = \sum_m s_{(m)} / M$.

5. The estimated sampling variance of $s_M$ as an estimate of S,
namely $V_M$, is the sum of two components:

$$V_M = U_M + (1 + M^{-1}) B_M ,$$

where $U_M$ and $B_M$ are defined as for (5), to quantify
uncertainty due to sampling subjects and the latency of $\theta$,
respectively.

6. If inferences about S would have been based on $(s-S) \sim N(0,U)$
had all responses been observed, inferences are now based on
$(s_M - S) \sim t_\nu(0, V_M)$, a t-distribution with degrees of freedom $\nu$
given in (6).

These procedures apply to vector-valued latent variables $\theta$ as
well as scalars. Extensions to vector-valued S are as discussed
previously.

Steps  1  and  2  above  produce  M  completed  datasets  that  can  be 
used  to  draw  inferences  about  any  number  of  sample  statistics  by 
applying  Steps  3-6  repeatedly.  An  attractive  feature  of  the 
approach  is  that  the  sophisticated  methodologies  and  heavy 
computation  are  isolated  in  Steps  1  and  2,  which  can  be  carried 
out just once--probably by the institution held responsible for
primary  data  analysis,  where  the  necessary  expertise  and  resources 
are  more  likely  to  be  available.  The  completed  datasets  are  then 
provided  to  secondary  researchers,  who  need  only  apply  standard 
routines  for  complete  data  M  times  and  combine  the  results  in 
simple  ways. 

Example  2:  Multiple  Imputation  under  Classical  Test  Theory 

This  example  lays  out  imputation  procedures  for  when  the 
latent-variable  model  is  the  classical  true-score  test  model  with 
normal  errors,  and  there  are  two  collateral  survey  variables.  In 
order  to  focus  on  the  construction  and  the  nature  of  imputations, 
it  is  assumed  that  a  large  SRS  will  be  drawn  from  a  multivariate 
normal population with known parameters. There are four variables
for each subject:

$\theta$, the latent variable, is examinee "true score;"
x is examinee observed score; and
$y_1$ and $y_2$ are collateral examinee variables.

Consistent with true-score test theory with normal errors,
assume that $x = \theta + e$, where the residual or error term e is
distributed $N(0, \sigma_e^2)$ independently of $\theta$, $y_1$ and $y_2$. The latent
variable model is thus

$$p(x \mid \theta) \sim N(\theta, \sigma_e^2) . \qquad (14)$$

Assume further that $(\theta, y_1, y_2)$ follows a standard multivariate
normal distribution in the population, so that jointly,
$(x, \theta, y_1, y_2) \sim MVN(0, \Sigma)$ with

$$\Sigma = \begin{bmatrix} 1+\sigma_e^2 & & & \text{(sym)} \\ 1 & 1 & & \\ r_{\theta 1} & r_{\theta 1} & 1 & \\ r_{\theta 2} & r_{\theta 2} & r_{12} & 1 \end{bmatrix} .$$

$\Sigma_y$ will refer to the covariance matrix for $y = (y_1, y_2)$.

The conditional distribution of $\theta$ given y is obtained as

$$p(\theta \mid y) \sim N(\beta' y, \sigma^2_{\theta|y}) , \qquad (15)$$

where

$$\beta' y = \beta_{1|2} \, y_1 + \beta_{2|1} \, y_2 = E(\theta \mid y)$$

and

$$\sigma^2_{\theta|y} = 1 - \beta' \Sigma_y \beta = 1 - R^2 ,$$

where $R^2$ is the proportion of variance of $\theta$ accounted for by y.
In this example, $\beta_{1|2} = (r_{\theta 1} - r_{\theta 2} r_{12}) / (1 - r_{12}^2)$ and
$\beta_{2|1} = (r_{\theta 2} - r_{\theta 1} r_{12}) / (1 - r_{12}^2)$. It follows that $p(x \mid y) \sim N(\beta' y, \sigma^2_{x|y})$, with

$$\sigma^2_{x|y} = \sigma^2_{\theta|y} + \sigma_e^2 .$$

In addition to the multiple regression coefficients there are
simple regression coefficients for $\theta$ or x on $y_1$ or $y_2$ alone; e.g.,

$$E(\theta \mid y_1) = E(x \mid y_1) = \beta_1 y_1 = (\beta_{1|2} + r_{12} \beta_{2|1}) \, y_1 .$$

In this example, $\beta_1 = r_{\theta 1}$ and $\beta_2 = r_{\theta 2}$.

Define the "conditional reliability" $\rho_c$ of x as a measure of
$\theta$ given y as

$$\rho_c = \sigma^2_{\theta|y} / (\sigma^2_{\theta|y} + \sigma_e^2) .$$

Note that $0 < \rho_c < 1$ and $\rho_c \, \sigma^2_{x|y} = \sigma^2_{\theta|y}$.

An imputation for a sampled subject will be drawn from a
predictive distribution of the form $p(\theta \mid x, y)$, which is
proportional to $p(x \mid \theta) \, p(\theta \mid y)$. The first factor in this product
is the likelihood, which in this example is $N(x, \sigma_e^2)$; the MLE is in
fact simply x. The second factor is the conditional density of $\theta$
given y, which is $N(\beta' y, \sigma^2_{\theta|y})$. By Kelley's (1947) formula,
Randomization-Based  Inference 


23 


p(0|x,y)  - 


where 


8  -  E(0|x,y)  -  pcx  +  (l-pc)  p'y 

and 

a2..  -  Var(0  |x,y)  -  (1-p  )  a2..  -  (1-p  )(1-R2)  . 

8  | xy  1  J  c  8 |y  c 

An  imputation  8  -  0(x,y)  is  thus  constructed  as 
8  -  6  +  f  , 

.  2 
where  f  is  drawn  at  random  from  N(0.a..  ). 

-  5|xy 
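The construction above can be sketched in a few lines of Python. The function and argument names are illustrative and do not come from the paper; the sketch simply evaluates the regression weights, $R^2$, the conditional reliability, and the posterior mean and variance, and then adds a normal draw to the posterior mean.

```python
import math
import random


def true_score_posterior(x, y1, y2, r_t1, r_t2, r_12, var_e):
    """Posterior mean, posterior variance, and rho_c for theta given x and y."""
    det = 1.0 - r_12 ** 2
    # Multiple regression weights of theta on (y1, y2).
    b1_2 = (r_t1 - r_t2 * r_12) / det
    b2_1 = (r_t2 - r_t1 * r_12) / det
    # R^2 = beta' Sigma_y beta for a 2x2 correlation matrix Sigma_y.
    r_sq = b1_2 ** 2 + b2_1 ** 2 + 2.0 * b1_2 * b2_1 * r_12
    var_t_y = 1.0 - r_sq                      # Var(theta | y)
    rho_c = var_t_y / (var_t_y + var_e)       # conditional reliability
    mean = rho_c * x + (1.0 - rho_c) * (b1_2 * y1 + b2_1 * y2)
    var = (1.0 - rho_c) * var_t_y             # Var(theta | x, y)
    return mean, var, rho_c


def draw_imputation(x, y1, y2, r_t1, r_t2, r_12, var_e, rng):
    """One imputation: posterior mean plus a draw from N(0, posterior variance)."""
    mean, var, _ = true_score_posterior(x, y1, y2, r_t1, r_t2, r_12, var_e)
    return mean + rng.gauss(0.0, math.sqrt(var))


# Assumed inputs: all correlations .5 and an error variance of 2/3,
# which makes the conditional reliability equal to one half.
mean, var, rho_c = true_score_posterior(1.0, 0.0, 0.0, 0.5, 0.5, 0.5, 2.0 / 3.0)
imputed = draw_imputation(1.0, 0.0, 0.0, 0.5, 0.5, 0.5, 2.0 / 3.0, random.Random(0))
```

With these assumed inputs the posterior mean shrinks $x = 1$ halfway toward the regression prediction $\beta' y = 0$.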

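For a given individual, an imputation is not an optimal point estimate of $\theta$. It is neither unbiased nor efficient, as is the MLE $\hat\theta = x$; nor does it minimize mean square error over the population, as does $\bar\theta$. But it can be shown that the distribution of $(\tilde\theta, y)$ is multivariate normal with the same mean vector and covariance matrix as that of $(\theta, y)$. For the population mean, for example,

$$\begin{aligned} E(\tilde\theta) &= E[\rho_c x + (1-\rho_c)\, \beta' y + f] \\ &= \rho_c E(x) + (1-\rho_c)\, \beta' E(y) + E(f) \\ &= \rho_c \cdot 0 + (1-\rho_c)\, \beta' 0 + 0 \\ &= 0 = E(\theta), \end{aligned}$$

$$\begin{aligned} \operatorname{Var}(\tilde\theta) &= 1 = \operatorname{Var}(\theta), \\ E(\tilde\theta \mid y) &= \beta' y = E(\theta \mid y), \\ \operatorname{Var}(\tilde\theta \mid y) &= \sigma_{\theta|y}^2 = \operatorname{Var}(\theta \mid y), \\ E(\tilde\theta \mid y_1) &= \beta_1 y_1 = E(\theta \mid y_1), \text{ and} \\ \operatorname{Var}(\tilde\theta \mid y_1) &= 1 - \beta_1^2 = \operatorname{Var}(\theta \mid y_1). \end{aligned}$$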

Given that the mean of $\tilde\theta$ is an unbiased estimate of the mean of $\theta$, how much does uncertainty increase when x and y are observed

as percentile points, conditional distributions, and proportions-of-variance-accounted-for. In contrast, treating either $\hat\theta$s or $\bar\theta$s as $\theta$s, one obtains estimates that have the correct expectations for some attributes, but not for others. For estimating the mean of $\theta$, all three turn out to have the correct expectation of zero. For the variance, the expectation using imputations is the correct value of one, but the expected variances of $\hat\theta$ and $\bar\theta$ are $1 + \sigma_e^2$ and $1 - \sigma_{\theta|xy}^2$, respectively.
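A small Monte Carlo check makes the contrast among the three kinds of values concrete. The sketch below is illustrative only: the function name, the sample size, the seed, and the error variance of 2/3 (which gives $\rho_c = .5$ when all correlations are .5) are assumptions, not values taken from the paper's tables.

```python
import math
import random


def simulate_variances(n, var_e, r_t1=0.5, r_t2=0.5, r_12=0.5, seed=0):
    """Sample variances of the MLE, the Bayes mean, and imputations."""
    rng = random.Random(seed)
    det = 1.0 - r_12 ** 2
    b1_2 = (r_t1 - r_t2 * r_12) / det
    b2_1 = (r_t2 - r_t1 * r_12) / det
    var_t_y = 1.0 - (b1_2 ** 2 + b2_1 ** 2 + 2.0 * b1_2 * b2_1 * r_12)
    rho_c = var_t_y / (var_t_y + var_e)
    post_sd = math.sqrt((1.0 - rho_c) * var_t_y)
    sums = [0.0, 0.0, 0.0]
    for _ in range(n):
        # Generate (y1, y2, theta, x) with the stated correlation structure.
        y1 = rng.gauss(0.0, 1.0)
        y2 = r_12 * y1 + math.sqrt(det) * rng.gauss(0.0, 1.0)
        reg = b1_2 * y1 + b2_1 * y2
        theta = reg + math.sqrt(var_t_y) * rng.gauss(0.0, 1.0)
        x = theta + math.sqrt(var_e) * rng.gauss(0.0, 1.0)
        bayes = rho_c * x + (1.0 - rho_c) * reg
        imputed = bayes + post_sd * rng.gauss(0.0, 1.0)
        # All population means are zero, so mean squares estimate variances.
        sums[0] += x * x
        sums[1] += bayes * bayes
        sums[2] += imputed * imputed
    return tuple(s / n for s in sums)


var_mle, var_bayes, var_imputed = simulate_variances(50000, var_e=2.0 / 3.0)
```

Under these assumptions the three results should fall near $1 + \sigma_e^2$, $1 - \sigma_{\theta|xy}^2$, and 1, respectively.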

Table 1 gives the expectations of estimates of some population attributes that result when various point estimates of $\theta$ are treated as if they were $\theta$. Imputations $\tilde\theta$ appear there, as well as the MLE $\hat\theta$ ($= x$). The Bayes estimate referred to above as $\bar\theta$ is denoted $\bar\theta_{xy}$ to indicate that it is conditional on both $x$ and $y$; another Bayes estimate often used in practice, denoted in the table as $\bar\theta_x$, ignores $y$. The variance of $\bar\theta_x$ is known to be less than that of $\theta$, so it is sometimes suggested that $\bar\theta_x$ values be inflated by the factor needed to bring their variance up to that of $\theta$. The resulting rescaled Bayes estimate is denoted by $\bar\theta_x^{(r)}$. While this rescaling corrects the variance of the population as a whole, it does not completely remove the biases associated with attributes of conditional distributions involving $y$.

Table  1  about  here 


Note that the distortions in secondary analyses of all these point estimates depend on test reliability. Reliability, and therefore the magnitudes of biases, will vary if test lengths differ over time or across respondent groups. Tables 2 and 3 illustrate this point by giving numerical values for the expressions in Table 1 that are obtained from a test with $\rho_c = .50$ and a test with $\rho_c = .91$, in both cases with $r_{\theta 1} = r_{\theta 2} = r_{12} = .50$. The first test has a reliability that might be expected with ten items on a particular topic that appear on an educational assessment instrument. The second has a reliability more like that of a 60-item achievement test. The biases that occur when using any of the "optimal" point estimates instead of the imputations are readily apparent for the short test but are less serious for the long test.

Tables 2 and 3 here


Along with moments of population and subpopulation distributions, cumulative probabilities are sometimes required, e.g., $P(\theta > 1)$ or $P(\theta > 1 \mid y_1 = -1)$. Statistics of this type are important in NAEP, where selected points along the proficiency scale are anchored by behavioral descriptions, and the proportions of populations and subpopulations above these points are tracked over time (NAEP, 1985). $P(\theta > 1)$, in this example, is the proportion of the total population above one standard deviation in the standard normal distribution, or .1587. $P(\theta > 1 \mid y_1 = -1)$ is the proportion of the subpopulation defined by $y_1 = -1$ with $\theta$-values higher than one standard deviation above the total population mean, which, in this example, is .0418, the proportion of a normal distribution with mean $-.5$ and standard deviation $\sqrt{.75}$ above 1. Table 4 gives the expectations of these values that obtain when point estimates of $\theta$ are treated as $\theta$. Even though the regression estimates for $\beta_1$ from $\hat\theta$ and $\bar\theta$ are unbiased, the estimated population proportions above the cut point are distorted. Again the distortion is less serious with the long test.
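The tail proportions discussed above can be reproduced with the standard normal tail function. This is a sketch rather than the paper's computation: the helper name is made up, and the error variance of 2/3 used for the MLE and Bayes-mean comparisons is an assumed value giving $\rho_c = .5$.

```python
import math


def normal_tail(cut, mean=0.0, sd=1.0):
    """P(X > cut) for X ~ Normal(mean, sd), via the complementary error function."""
    return 0.5 * math.erfc((cut - mean) / (sd * math.sqrt(2.0)))


# True population and subpopulation proportions above theta = 1.
p_total = normal_tail(1.0)
p_sub = normal_tail(1.0, mean=-0.5, sd=math.sqrt(0.75))

# Marginal proportions obtained if point estimates are treated as theta
# (assumed error variance 2/3, hence posterior variance 1/3).
var_e, post_var = 2.0 / 3.0, 1.0 / 3.0
p_mle = normal_tail(1.0, sd=math.sqrt(1.0 + var_e))       # Var(x) = 1 + var_e
p_bayes = normal_tail(1.0, sd=math.sqrt(1.0 - post_var))  # Var of posterior means
```

Because the MLEs are overdispersed and the Bayes means underdispersed relative to $\theta$, the first overstates and the second understates the proportion above the cut point, even though both have mean zero.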
Computing Approximations and Secondary Biases

Obtaining consistent estimates of population attributes with multiple imputations requires drawing imputations from consistent estimates of the correct predictive distributions $p(\theta \mid x, y)$. Assuming the latent variable model $p(x \mid \theta)$ is specified correctly, it is possible to obtain detailed nonparametric approximations of $p(\theta \mid y)$, and therefore of $p(\theta \mid x, y)$, when the dimensionalities of $\theta$ and $y$ are low, say fewer than five latent variables and five collateral variables (Mislevy, 1984). When the dimensionalities of $\theta$ and $y$ are high, however, as in NAEP with its hundreds of background and attitude items, simplifications and computing approximations cannot be avoided. This section lays out a general framework for the problems entailed by using simplified approximations of $p(\theta \mid y)$, derives some explicit results for the true-score example introduced above, and offers guidelines for practical applications.

The  Nature  of  the  Problem 

The imputation-based estimate $s_M(x, y)$ approximates the expectation of $s(\theta, x, y)$ defined in (11) by evaluating $s$ with draws $\tilde\theta$ from

$$p(\theta \mid x, y, z) \propto \iint p(x \mid \theta, \beta)\, p(\theta \mid y, z, \alpha)\, p(\alpha, \beta \mid x, y, z)\, d\alpha\, d\beta. \tag{16}$$

Rubin raises the possibility of biases in secondary analyses when he mentions that the imputer's model may differ from the analyst's (Rubin, 1987, pp. 107, 109-112, 151). His analyses suggest that biases for population means and variances may be mild when filling in a modest number of missing responses in standard surveys, particularly if the imputer's model is more inclusive than the analyst's. It will become apparent that this conclusion need not hold in the context of latent-variable models, however, especially when practical constraints force the imputer to use a less inclusive model than the analyst.


Example  2  (continued) 

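Consider again the multinormal example introduced above, with its true-score latent variable model $x \mid \theta \sim N(\theta, \sigma_e^2)$ and population model $\theta \mid y \sim N(\beta' y, \sigma_{\theta|y}^2)$. Suppose now that the imputer conditions on $y_1$ but not $y_2$ when building the model from which imputations are drawn. That is, the correct predictive distribution for imputations is

$$p(\theta \mid x, y) = N[\rho_c x + (1-\rho_c)\, \beta' y,\ \sigma_{\theta|xy}^2],$$

but the imputer draws from

$$p^*(\theta \mid x, y) = N[\rho_c^* x + (1-\rho_c^*)\, \beta_1 y_1,\ \sigma_{\theta|xy_1}^2],$$

where $\rho_c^*$ is the conditional reliability of $x$ given $y_1$ but not $y_2$,

$$\rho_c^* = \sigma_{\theta|y_1}^2 / (\sigma_{\theta|y_1}^2 + \sigma_e^2) = (1 - r_{\theta 1}^2) / (1 - r_{\theta 1}^2 + \sigma_e^2),$$

and

$$\sigma_{\theta|xy_1}^2 = \operatorname{Var}_{p^*}(\theta \mid x, y_1) = (1-\rho_c^*)\, \sigma_{\theta|y_1}^2 = (1-\rho_c^*)(1 - r_{\theta 1}^2).$$

An imputation for Subject $i$ now takes the form

$$\tilde\theta_i^* = \rho_c^* x_i + (1-\rho_c^*)\, \beta_1 y_{i1} + g,$$

where $g$ is a draw from $N(0, \sigma_{\theta|xy_1}^2)$.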

How do the attributes of the distribution of imputed values $\tilde\theta^*$ fare as estimates of the attributes of the distribution of $\theta$? By calculations similar to those in the first part of the example, it can be shown that $(\tilde\theta^*, y)$ is normal with mean vector 0, as is $(\theta, y)$, and its covariance matrix agrees with that of $(\theta, y)$ for all elements except the one for $\tilde\theta^*$ with $y_2$. Rather than $r_{\theta 2}$, one obtains

$$\operatorname{Cov}(\tilde\theta^*, y_2) \equiv r^*_{\theta 2} = \rho_c^*\, r_{\theta 2} + (1-\rho_c^*)\, r_{\theta 1} r_{12}.$$
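The altered covariance can be checked by simulation. The sketch below generates data from the example's joint normal model, imputes while conditioning on $y_1$ only, and compares the sample moments with the formula above. All names, the seed, the sample size, and the error variance of 2/3 are assumptions made for illustration.

```python
import math
import random


def simulate_omitted_covariate(n, var_e, r_t1=0.5, r_t2=0.5, r_12=0.5, seed=1):
    """Moments of imputations drawn from a model that ignores y2."""
    rng = random.Random(seed)
    det = 1.0 - r_12 ** 2
    b1_2 = (r_t1 - r_t2 * r_12) / det
    b2_1 = (r_t2 - r_t1 * r_12) / det
    var_t_y = 1.0 - (b1_2 ** 2 + b2_1 ** 2 + 2.0 * b1_2 * b2_1 * r_12)
    # Imputer's reliability and residual spread, conditioning on y1 only.
    rho_star = (1.0 - r_t1 ** 2) / (1.0 - r_t1 ** 2 + var_e)
    sd_star = math.sqrt((1.0 - rho_star) * (1.0 - r_t1 ** 2))
    s_var = s_cov1 = s_cov2 = 0.0
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = r_12 * y1 + math.sqrt(det) * rng.gauss(0.0, 1.0)
        theta = b1_2 * y1 + b2_1 * y2 + math.sqrt(var_t_y) * rng.gauss(0.0, 1.0)
        x = theta + math.sqrt(var_e) * rng.gauss(0.0, 1.0)
        # beta_1 equals r_t1 in this example.
        imp = rho_star * x + (1.0 - rho_star) * r_t1 * y1 + sd_star * rng.gauss(0.0, 1.0)
        # Means are zero, so raw cross-products estimate (co)variances.
        s_var += imp * imp
        s_cov1 += imp * y1
        s_cov2 += imp * y2
    predicted = rho_star * r_t2 + (1.0 - rho_star) * r_t1 * r_12
    return s_var / n, s_cov1 / n, s_cov2 / n, predicted


var_imp, cov_y1, cov_y2, predicted_cov_y2 = simulate_omitted_covariate(100000, 2.0 / 3.0)
```

Under these assumptions the variance and the covariance with $y_1$ should match those of $\theta$, while the covariance with $y_2$ should fall below $r_{\theta 2}$ and agree with the predicted value.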


It follows first that characteristics of the joint distribution of $\tilde\theta^*$ and $y_1$ are identical to those of $\theta$ and $y_1$. For instance,

$$E(\tilde\theta^*) = 0 = E(\theta)$$

and

$$\operatorname{Var}(\tilde\theta^*) = 1 = \operatorname{Var}(\theta).$$

This corresponds to the case in which the imputer's model is more inclusive than the analyst's, and, as Rubin suggests, the result is satisfactory. Also, corresponding to the case in which the imputer and the analyst use the same model, one obtains

$$E(\tilde\theta^* \mid y_1) = \beta_1 y_1 = E(\theta \mid y_1)$$

and

$$\operatorname{Var}(\tilde\theta^* \mid y_1) = 1 - \beta_1^2 = \operatorname{Var}(\theta \mid y_1).$$

The secondary analyst thus obtains the correct expectations for both marginal analyses of $\theta$ and conditional analyses involving only $\theta$ and the collateral variable that was conditioned on.

Less salubrious are results for analyses involving $y_2$, the collateral variable that was not conditioned on. Whereas

$$E(\theta \mid y) = E(\tilde\theta \mid y) = \beta_{1|2}\, y_1 + \beta_{2|1}\, y_2,$$

one finds that

$$\begin{aligned} E(\tilde\theta^* \mid y) &= [\beta_{1|2} + (1-\rho_c^*)\, \beta_{2|1} r_{12}]\, y_1 + \rho_c^*\, \beta_{2|1}\, y_2 \\ &= \beta_{1|2}\, y_1 + \beta_{2|1}\, [\rho_c^* y_2 + (1-\rho_c^*)\, E(y_2 \mid y_1)] && \text{(20a)} \\ &= \beta_{1|2}\, y_1 + \beta_{2|1}\, y_2 - (1-\rho_c^*)\, \beta_{2|1}\, (y_2 - r_{12} y_1). && \text{(20b)} \end{aligned}$$

A bias is thus introduced into the imputations, the nature of which is to attenuate the contribution from $y_2$, the omitted variable. Equation (20a) shows that the contribution associated with $y_2$ is a weighted average of (i) the correct contribution, $\beta_{2|1} y_2$, to the degree that $x$ is a reliable indicator of $\theta$, and (ii) to the degree that $x$ is unreliable, a contribution associated with the expectation of the omitted variable given the observed value of the conditioned variable. Equation (20b) shows that the bias can be driven to zero in three ways:

a. As $\rho_c^* \to 1$; i.e., $x$ is a perfectly reliable measure of $\theta$;

b. As $\beta_{2|1} \to 0$; i.e., there is no contribution from $y_2$ anyway;

c. As $r_{12} y_1 \to y_2$ for all $y_1$; i.e., $y_2$ is perfectly predictable from $y_1$.
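The equivalence of the three expressions for the expected imputation, and the three routes to zero bias, can be verified numerically. The function names and the numeric inputs below are illustrative assumptions.

```python
def expected_imputation_forms(y1, y2, rho_star, b1_2, b2_1, r_12):
    """Three algebraically equivalent forms of E(imputation | y1, y2)."""
    direct = (b1_2 + (1.0 - rho_star) * b2_1 * r_12) * y1 + rho_star * b2_1 * y2
    # (20a): weighted average of y2 and its expectation given y1.
    form_a = b1_2 * y1 + b2_1 * (rho_star * y2 + (1.0 - rho_star) * r_12 * y1)
    # (20b): correct conditional mean minus a bias term.
    form_b = b1_2 * y1 + b2_1 * y2 - (1.0 - rho_star) * b2_1 * (y2 - r_12 * y1)
    return direct, form_a, form_b


def imputation_bias(y1, y2, rho_star, b2_1, r_12):
    """Bias term of (20b): expected imputation minus the true E(theta | y)."""
    return -(1.0 - rho_star) * b2_1 * (y2 - r_12 * y1)


forms = expected_imputation_forms(y1=-1.0, y2=2.0, rho_star=0.6,
                                  b1_2=1.0 / 3.0, b2_1=1.0 / 3.0, r_12=0.5)
bias_reliable = imputation_bias(-1.0, 2.0, rho_star=1.0, b2_1=1.0 / 3.0, r_12=0.5)
bias_no_effect = imputation_bias(-1.0, 2.0, rho_star=0.6, b2_1=0.0, r_12=0.5)
bias_predictable = imputation_bias(2.0, 2.0, rho_star=0.6, b2_1=1.0 / 3.0, r_12=1.0)
```

Each of the last three calls corresponds to one of cases a through c, and each returns zero.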

Some consequences for regression analyses involving $y_2$ are now considered.

Whereas the regression coefficient for $y_2$ in the multiple regression of $\theta$ on $y$ is $\beta_{2|1}$, the corresponding coefficient for $y_2$ in the multiple regression of $\tilde\theta^*$ on $y$ is

$$\beta^*_{2|1} = \rho_c^*\, \beta_{2|1}. \tag{21}$$

The expected regression coefficient has been shrunken by the proportion $(1-\rho_c^*)$, the complement of test reliability given $y_1$.

Whereas the regression coefficient for $y_1$ in the multiple regression of $\theta$ on $y$ is obtained as

$$\beta_{1|2} = (r_{\theta 1} - r_{\theta 2} r_{12}) / (1 - r_{12}^2),$$

the corresponding coefficient in the multiple regression of $\tilde\theta^*$ on $y$ is

$$\beta^*_{1|2} = (r_{\theta 1} - r^*_{\theta 2} r_{12}) / (1 - r_{12}^2). \tag{22}$$

The bias can be expressed as

$$\beta^*_{1|2} - \beta_{1|2} = (1-\rho_c^*)\, \beta_{2|1}\, r_{12}.$$

Thus, in the analysis of conditional $\theta$ distributions given both $y_1$ and $y_2$, bias exists for the coefficient of $y_1$ even though it has been conditioned on. Since $r_{12} y_1 = E(y_2 \mid y_1)$, the character of the bias is to absorb a portion of the unique contribution of the nonconditioned variable, to the extent that $x$ is unreliable.


Whereas the coefficient $\beta_2$ for the simple regression of $\theta$ on $y_2$ is $r_{\theta 2}$, the corresponding coefficient for $\tilde\theta^*$ on $y_2$ is

$$\beta^*_2 = r^*_{\theta 2} = \rho_c^*\, r_{\theta 2} + (1-\rho_c^*)\, r_{\theta 1} r_{12}. \tag{23}$$

The bias with which $\beta_2$ is estimated from $\tilde\theta^*$ can be expressed as

$$\beta_2 - \beta^*_2 = (1-\rho_c^*)(1 - r_{12}^2)\, \beta_{2|1}.$$

This bias is reduced as either $\rho_c^*$, the test reliability, or $r_{12}^2$, the proportion of $y_2$ predictable by $y_1$, approaches one, or as the unique contribution of $y_2$ in predicting $\theta$ approaches zero.

To summarize, the degree of bias in secondary analyses involving variables not conditioned on depends on (i) the reliability of $x$, (ii) the association between the conditioned and the nonconditioned collateral variables, and (iii) the unique contribution of the nonconditioned variable to predicting $\theta$. Higher values of $\rho_c^*$ are unequivocally helpful, as they reduce biases of all types. Higher values of $r_{12}^2$, on the other hand, mitigate bias in simple regression involving only the nonconditioned variable, but exacerbate the bias for the coefficient of the conditioned variable in multiple regression involving both. These conclusions extend in natural ways to sets of conditioned and nonconditioned variables (Beaton and Johnson, 1987; Mislevy and Sheehan, 1987).
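The regression results (21) through (23) can be collected in one small calculation that starts from the correlations and the error variance. The function name, dictionary keys, and numeric inputs are illustrative assumptions; the error variance of 2/3 is not taken from the paper's tables.

```python
def omitted_variable_coefficients(r_t1, r_t2, r_12, var_e):
    """True and imputation-based regression coefficients when y2 is omitted."""
    det = 1.0 - r_12 ** 2
    rho_star = (1.0 - r_t1 ** 2) / (1.0 - r_t1 ** 2 + var_e)
    # Covariance of the imputations with y2, as in (23).
    r_star_t2 = rho_star * r_t2 + (1.0 - rho_star) * r_t1 * r_12
    true = {
        "b1|2": (r_t1 - r_t2 * r_12) / det,
        "b2|1": (r_t2 - r_t1 * r_12) / det,
        "b2": r_t2,
    }
    # Same regression formulas, with r_star_t2 in place of r_t2.
    imputed = {
        "b1|2": (r_t1 - r_star_t2 * r_12) / det,
        "b2|1": (r_star_t2 - r_t1 * r_12) / det,
        "b2": r_star_t2,
    }
    return rho_star, true, imputed


rho_star, true, imputed = omitted_variable_coefficients(0.5, 0.5, 0.5, 2.0 / 3.0)
```

Computing the imputed coefficients from the altered covariance, rather than from the closed forms, gives an independent check that the shrinkage in (21) and the two bias expressions hold.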

Table 5 gives values for expected regression coefficients in analyses of $\tilde\theta$ and $\tilde\theta^*$, using the numerical values employed in Tables 2 and 3. For simple regression, the results for the conditioned variable are unbiased, and the results for the nonconditioned variable are comparable in accuracy to those obtainable from "optimal" point estimates (bearing in mind the fact that MLEs yield unbiased regression coefficients but biased conditional variances and percentile points). It can also be seen that the results of multiple regression are more sensitive to omitting variables than those of simple regression. These findings underscore the importance of choosing conditioning variables wisely if practical considerations preclude conditioning on all the background variables that have been surveyed.


Insert  Table  5  here 


Implications  and  Recommendations 

The foregoing results indicate that care is required to impute values for latent variables in a way that leads to acceptably accurate results in secondary analyses. Precise determination of $\theta$ by item responses $x$ alleviates the problem of biases, but to do so in the context of educational or psychological measurement requires large numbers of item responses from every subject, a design that is inefficient for drawing inferences about only population attributes. When testing time for individuals is limited, the imputer must build a computing approximation $p^*(\theta \mid y, z)$ that gives good results for a broad range of potential statistics $s$ involving $\theta$.

If there are only a small number of values that $y$ and $z$ can take, and a large number of subjects at each combination of values, one can obtain a nonparametric estimate for each $(y, z)$ combination by the methods of Laird (1978) or Mislevy (1984). This leads to imputations that are free from specification error in $p(\theta \mid y, z)$, and secondary analyses will not suffer from biases from this source. This approach is simply not possible, though, for surveys such as NAEP with large numbers of background and attitude items.

As mentioned above, one possibility is to assume normal conditional distributions with structured means and a common residual variance (Mislevy, 1985). In addition to assuming this tractable distributional form, further simplification becomes necessary when there are large numbers of design and background variables. In ANOVA terms, conditioning on the full joint distribution of $(y, z)$ could involve millions of effects, while currently available computing procedures can handle up to about a hundred. Specifying $p^*$ wisely means choosing contrasts so as to optimize the accuracy of potential secondary analyses. Based on the results of Example 2, the following advice can be offered:

Determine $\theta$ as well as is practical. In the context of measuring latent proficiencies by test items, recall the decreasing rate at which reliability increases (and potential biases in secondary analyses thereby decrease) with additional test items. Trade-offs arise among potential item-sampling designs. Compared to a design that gives five items to each subject, a design that gives ten items yields less efficient estimates of statistics involving variables $y^c$ that have been conditioned on, but less biased estimates of those involving variables $y^{nc}$ that have not been conditioned on.

Borrow information from related scales. As noted earlier, imputation methods apply to vector-valued latent variables. In such applications, one estimates multivariate conditional distributions $p(\theta \mid y, z)$, combines them with multidimensional likelihoods $p(x \mid \theta)$, and draws vector-valued imputations from joint predictive distributions $p(\theta \mid x, y, z)$. This was done in the NAEP survey of Young Adult Literacy (Kirsch and Jungeblut, 1986), where each respondent was presented five to fifteen items in each of four IRT literacy scales. The population correlations of about .6 among scales sharpened the predictive distribution of each scale for an individual: while the information available directly about the scale was worth ten items, the thirty items from the other scales indirectly contributed information worth about another ten. The biases in secondary analyses were thus reduced as much by using multivariate imputations for the four scales jointly as they would have been under separate univariate imputations with twice as many items.
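The diminishing returns from additional items, noted under the first recommendation and again in the "twice as many items" comparison, can be illustrated with the Spearman-Brown formula. The text does not invoke this formula, so its use here is an assumption of the sketch, as is the starting reliability of .5 for a base-length test.

```python
def spearman_brown(rho, k):
    """Reliability of a test lengthened k-fold, given base reliability rho."""
    return k * rho / (1.0 + (k - 1.0) * rho)


base = 0.5  # assumed reliability of the base-length test
reliabilities = [spearman_brown(base, k) for k in (1, 2, 3, 4)]
# Bias in omitted-variable analyses scales with the complement of reliability.
bias_factors = [1.0 - r for r in reliabilities]
gains = [b - a for a, b in zip(reliabilities, reliabilities[1:])]
```

Each added block of items raises reliability by less than the previous block did, so the complement of reliability, which scales the biases derived in Example 2, falls ever more slowly.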

Condition explicitly on contrasts that are particularly important, such as treatment group in a survey designed expressly to compare treatment effects in a program evaluation. By doing so, one ensures that the marginal subpopulation means or regression coefficients involving key variables are estimated as accurately as possible. Note that $y^c$ can include interactions as well as main effects.

Condition on well-chosen combinations of variables. Given current computational limits, it will often be impossible to condition on the main effects of all background variables, let alone two-way or higher interactions among them. One can reduce biases for a large number of contrasts of interest, beyond those that can be conditioned on explicitly, by conditioning on linear combinations of contrasts, for example, the first $h$ component scores from a principal components decomposition of the covariance matrix among effects. The results of Example 2 imply that the variation these partially-conditioned-upon variables share with the explicitly-conditioned-upon component scores will have salutary effects in secondary analyses. The degree of bias for an effect will be limited to the proportion of its variance unaccounted for by the conditioned-upon components, times the complement of conditional test reliability.

Example 3: The 1984 NAEP Reading Assessment

During the 1983-84 school year, the National Assessment of Educational Progress (NAEP) surveyed the reading and writing skills of national probability samples of students at ages 9, 13, and 17, and in the modal grades associated with those ages, namely 4, 8, and 11. Beaton (1987) gives details of assessment procedures and analyses. This section of the present paper highlights the multiple-imputation procedures used in the analysis of the reading data.

The Student-Sampling Design

A  multistage  probability  sampling  design  was  employed  to 
select  students  into  the  NAEP  sample,  with  counties  or  groups  of 
counties as the primary sampling units (PSUs). Schools served as
second-stage  sampling  units.  The  assignment  of  testing  sessions 
of  different  types  to  sampled  schools  was  the  third  stage  of 
sampling,  and  the  selection  of  students  within  schools  was  the 
fourth.  A  total  of  64  PSUs  appeared  in  the  sample,  and 
assessments  were  administered  at  1,465  schools.  About  20,000 
students  were  assessed  in  reading  at  each  grade/age  cohort.  For 
convenience,  grade/age  cohorts  will  be  referred  to  below  simply  by 
their  age  designations. 

Sampling  was  stratified  at  the  first  stage  according  to 
geographic  regions,  Census  Bureau  "Sample  Description  of 
Community"  (SDOC)  classes,  and,  within  urban  and  rural  SDOC 
classes, a measure of SES; the latter two criteria comprise Size
and  Type  of  Community  (STOC)  classes.  Selection  probabilities  of 
sampling  units  were  inversely  proportional  to  estimated  population 
size,  except  that  extreme  rural  and  low-SES  urban  areas  were 
oversampled  by  a  factor  of  two.  Neglecting  minor  adjustments  for 
nonresponse  and  poststratification,  the  design  variables  Z  were 
therefore  region,  STOC,  PSU,  and  school  membership. 

Population means and totals of survey variables were computed as weighted sample means and totals, with a student's weight essentially inversely proportional to his or her probability of selection. Uncertainty of such a statistic $s$ due to student sampling was approximated with a multiweight jackknife procedure. Thirty-two pairs of similar PSUs were designated. Approximating the uncertainty of $s$ required computing it 33 times: once in a run with the total sample, and once with a run corresponding to each PSU pair, with one of its members left out of the sample but with the sampling weight of its partner doubled. The variance of the 32 jackknife estimates around the total value is $U$, the estimated sampling variance of $s$ around $S$.
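A minimal sketch of the paired jackknife for a weighted mean follows. The record layout, function names, and toy data are hypothetical. The sketch takes $U$ to be the sum of squared deviations of the replicate estimates from the full-sample value, a common convention for paired jackknives; the text does not spell out the divisor, so this is an assumption.

```python
def weighted_mean(records, weight_scale):
    """Weighted mean of values; weight_scale rescales weights by PSU."""
    num = den = 0.0
    for psu, weight, value in records:
        w = weight * weight_scale.get(psu, 1.0)
        num += w * value
        den += w
    return num / den


def paired_jackknife(records, psu_pairs):
    """Full-sample estimate, replicate estimates, and jackknife variance U."""
    full = weighted_mean(records, {})
    replicates = []
    for dropped, partner in psu_pairs:
        # Drop one member of the pair and double its partner's weights.
        replicates.append(weighted_mean(records, {dropped: 0.0, partner: 2.0}))
    u = sum((rep - full) ** 2 for rep in replicates)
    return full, replicates, u


# Toy data: (PSU label, sampling weight, value) with two PSU pairs.
records = [("A", 1.0, 10.0), ("B", 1.0, 12.0), ("C", 1.0, 20.0), ("D", 1.0, 22.0)]
full, replicates, u = paired_jackknife(records, [("A", "B"), ("C", "D")])
```

With 32 designated pairs, as in the design described above, the loop would produce the 32 replicate runs that accompany the single full-sample run.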

The  Survey  Variables 

Each  student  responded  to  a  number  of  survey  items  (Y) 
tapping  demographic  status,  educational  background  and  reading 
practices,  and  attitudes  about  reading  and  writing.  About  50 
were  common  to  all  assessment  forms.  Examples  are  gender, 
parents'  education,  ethnicity,  and  time  spent  watching  television. 
Another  300,  of  which  a  given  student  would  receive  between  about 
10  and  30  under  the  assessment's  balanced  incomplete  block  (BIB) 
item-sampling design, addressed reading activities in the home and school.

A total of 340 multiple-choice and free-response reading exercises were used in the assessment, although a student who received any reading exercises received between 5 and 50 of them under the BIB design. About 80 percent of the students received some reading exercises. A few of the exercises appeared at all three ages, but most appeared in only one or in two adjacent ages.

The Latent-Variable Model

A priori considerations and extensive dimensionality analyses supported summarizing responses to a subset of 228 of the 340 items by an IRT model for a single underlying proficiency variable $\theta$ (Zwick, 1987). Responses to these items will be denoted by $x$. The 3-parameter logistic (3PL) IRT model was used. Under the 3PL, the probability of a correct response to Item $j$ from a student with proficiency $\theta$ is given by

$$P(x_j = 1 \mid \theta, a_j, b_j, c_j) = c_j + (1 - c_j) / (1 + \exp[-1.7 a_j (\theta - b_j)]),$$

where $x_j = 1$ indicates a correct response and $x_j = 0$ an incorrect one, while $(a_j, b_j, c_j)$ are parameters that characterize the regression of $x_j$ on $\theta$. Coupled with this model for a single item, the assumptions of local independence embodied by (8) and (9) give the likelihood function $p(x \mid \theta, \beta)$ $[= p(x \mid \theta, \beta, y, z)]$ for a response vector $x$ to any subset of the 228 items. Here $\beta$ denotes the vector of item parameters of all 228 items in the scale.
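The item response function and the likelihood under local independence can be written directly from the definitions above. The function names and the item parameters in the usage lines are made up for illustration.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """3PL probability of a correct response, with the 1.7 scaling constant."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def likelihood(theta, responses, items):
    """Likelihood of a 0/1 response vector, assuming local independence."""
    value = 1.0
    for x, (a, b, c) in zip(responses, items):
        p = p_correct_3pl(theta, a, b, c)
        value *= p if x == 1 else 1.0 - p
    return value


# Hypothetical (a, b, c) triples; only administered items enter the product.
items = [(1.0, 0.0, 0.2), (1.0, 0.0, 0.0)]
like_at_zero = likelihood(0.0, [1, 0], items)
```

Because only the items a student actually received appear in the product, the same function serves any subset of the scale.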

Students  receiving  any  of  the  228  reading  items  in  the  scale 
at  all  received  between  5  and  40  of  them,  with  the  average  about 
17.  The  responses  of  a  sample  of  10,000  students  were  used  with  a 
modified  version  of  Mislevy  and  Bock's  (1983)  BILOG  computer 
program  to  estimate  3PL  item  parameters.  The  modifications 
allowed  the  c-parameters  of  free-response  items  to  be  set  to  zero 
a  priori,  and  distinguished  age  cohorts  when  computing  marginal 
probabilities  of  students'  response  patterns.  The  latter 


extension is necessary to achieve consistent item parameter estimates since items were assigned to cohorts based on prior knowledge about the proficiencies of the cohorts (younger students were generally administered easier items), but that consistency holds whether or not sampling weights within cohorts are used (Mislevy and Sheehan, in press). The resulting item parameter estimates were placed on a scale in which the unweighted calibration sample of students was standardized. This scale is arbitrary up to a linear transformation, so any other convention would have served equally well to set the scale. The resulting estimates were taken as fixed throughout the remainder of the study, thereby fixing the origin and unit-size of the $\theta$ scale.

The  Imputation  Model 

The ideal population model $p(\theta \mid y, z)$ to use in conjunction with $p(x \mid \theta, \beta)$ is a joint distribution involving hundreds of survey variables and background and attitude items. The computer program available to estimate a latent population distribution was a prototype developed for Mislevy's (1985) 4-group example, with ANOVA-type structures on the means and normal residuals with a common within-group variance. In the time available to meet NAEP reporting deadlines, it was possible to extend the program to a design with 17 effects. A main effects model was chosen that focused accuracy on traditional NAEP reporting categories: sex, ethnicity, STOC, region, and parental education, along with indicators for at, above, or below modal grade and age. A "miscellaneous" cell was included in the model for the small fraction of students whose values on the aforementioned items were unrecoverable. Altogether, these variables comprised the vector $(y, z)^c$. The following model was assumed for conditional distributions:

$$\theta \mid y, z \sim N[(y, z)^c\, \Gamma, \sigma^2],$$

where $\Gamma$ is a vector of seventeen main effect parameters and $\sigma^2$ is the residual variance. Together, $\Gamma$ and $\sigma^2$ comprise the superpopulation parameter $\alpha$ referred to in preceding sections.

Note that the design variables $Z$ are captured largely but not completely, as STOC and region have been included but PSU, school membership, and interaction terms have not been. Because PSU and school-level variation are largely explained by the conditioned-upon region and SES indicators STOC and parental education, however, biases caused by the omission of PSU and school membership in $(y, z)^c$ are largely mitigated.

Maximum likelihood estimates of $\Gamma$ and $\sigma^2$ were obtained separately within ages using all available reading data (over 20,000 respondents per age). Separate age analyses implicitly allow for all two-way interactions between cohort and each of the other effects. The results are shown in Table 6. Like item parameter estimates, these estimates were also taken as known true values thereafter.



Table  6  about  here 


The posterior distribution of each student $i$ was approximated by a histogram over 40 equally spaced points $\Theta_q$ between $-4.785$ and $+4.785$. The weight of the $q$th bar in the histogram for the $i$th student was obtained as follows:

$$P(\Theta_q \mid x_i, y_i^c, z_i^c) = \frac{P(x_i \mid \theta = \Theta_q, \beta = \hat\beta)\, P(\theta = \Theta_q \mid y_i^c, z_i^c, \alpha = \hat\alpha)}{\sum_s P(x_i \mid \theta = \Theta_s, \beta = \hat\beta)\, P(\theta = \Theta_s \mid y_i^c, z_i^c, \alpha = \hat\alpha)}, \tag{24}$$

where $P(\theta = \Theta_s \mid y_i^c, z_i^c, \alpha = \hat\alpha)$ is the density at $\Theta_s$ of the normal pdf with mean $(y, z)^c\, \hat\Gamma$ and the residual variance for the age group to which Student $i$ belongs. Each imputation was drawn from this distribution in two steps. First, a bar was selected at random in accordance with the weights determined in (24). Second, a point was selected at random from a uniform density over the $\theta$ range spanned by the selected bar. (Logistic interpolation would have been better.) Five imputations were drawn in this manner for each student in the sample.
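The two-step draw can be sketched as follows. The item parameters, responses, and prior mean and variance are hypothetical, and the sketch assumes that each bar is centered on its grid point with width equal to the grid spacing, a detail the text does not specify.

```python
import math
import random

# 40 equally spaced grid points between -4.785 and +4.785.
GRID = [-4.785 + q * (2.0 * 4.785 / 39.0) for q in range(40)]
WIDTH = GRID[1] - GRID[0]


def p_correct_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def bar_weights(responses, items, prior_mean, prior_var):
    """Normalized histogram weights: likelihood times normal density, as in (24)."""
    raw = []
    for t in GRID:
        like = 1.0
        for x, (a, b, c) in zip(responses, items):
            p = p_correct_3pl(t, a, b, c)
            like *= p if x == 1 else 1.0 - p
        # The normal density's constant cancels in the normalization below.
        raw.append(like * math.exp(-0.5 * (t - prior_mean) ** 2 / prior_var))
    total = sum(raw)
    return [v / total for v in raw]


def draw_imputation(weights, rng):
    """Step 1: pick a bar by weight. Step 2: draw uniformly within that bar."""
    q = rng.choices(range(len(GRID)), weights=weights, k=1)[0]
    return rng.uniform(GRID[q] - WIDTH / 2.0, GRID[q] + WIDTH / 2.0)


items = [(1.0, 0.0, 0.2), (1.2, 0.5, 0.2), (0.8, -0.5, 0.0)]  # hypothetical
weights = bar_weights([1, 0, 1], items, prior_mean=0.0, prior_var=1.0)
rng = random.Random(0)
imputations = [draw_imputation(weights, rng) for _ in range(5)]
```

The last line draws five imputations for one student, mirroring the five completed data sets described above.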


Illustrative  Results 

The results of primary interest in this first analysis of the
1984 data were the mean proficiencies of the subpopulations
determined by the traditional NAEP reporting categories. A given
weighted mean was calculated five times, once from each completed
dataset. The average of the five was the reported estimate. A
jackknife variance was also calculated from each completed
dataset, with the imputations in the set treated as known true
values of $\theta$; these values were also averaged, to estimate
$U$. The reported variance was the sum of this average jackknife
variance and $1 + M^{-1} = 1.2$ times the variance of the five
aforementioned means. In order to avoid negative numbers and
fractional values, the original $\theta$ scale was linearly
transformed by $50\theta + 250.5$. Table 7 illustrates some
results for Age 17. The full public report is available as The
Reading Report Card: Progress toward Excellence in our Schools
(NAEP, 1985).
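
The combining step described above can be checked against the "Total" column of Table 7 with a short Python sketch. The function name `combine` and its arguments are illustrative. Table 7 reports only the average jackknife variance (1.248), so that value is repeated as a stand-in for the five separate jackknife variances. Row (7) of Table 7 is reproduced when the variance of the five means is computed with divisor $M$ rather than $M-1$, so the divisor is left as a parameter.

```python
import numpy as np

def combine(estimates, jackknife_vars, ddof=0):
    """Combine M completed-dataset results into one estimate and variance.

    estimates      : the M weighted means, one per completed dataset
    jackknife_vars : the M jackknife variances, one per completed dataset
    ddof           : 0 divides the between-imputation variance by M (which
                     matches row (7) of Table 7); 1 divides by M - 1
    """
    estimates = np.asarray(estimates, dtype=float)
    m = estimates.size
    point = estimates.mean()                      # reported estimate
    u_bar = float(np.mean(jackknife_vars))        # average jackknife variance
    between = estimates.var(ddof=ddof)            # variance of the M means
    total = u_bar + (1 + 1 / m) * between         # reported variance
    increase = (total - u_bar) / u_bar            # proportional increase
    return point, total, increase

# "Total" column of Table 7; the single tabled average 1.248 stands in
# for the five jackknife variances, which are not reported separately.
total_means = [288.005, 288.258, 288.208, 288.135, 287.819]
point, total_var, increase = combine(total_means, [1.248] * 5)
```

With these inputs the sketch gives values that agree with rows (6), (9), and (10) of Table 7 to three decimal places.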


Table  7  about  here 


The proportional increase in variance due to the latency of
$\theta$, denoted earlier as $r_M$, varies from 2 percent up to
nearly 30 percent. The largest increase is associated with a
lower-than-average scoring group, for whom the test items were
relatively more difficult. The proficiencies of low-scoring
individuals were determined less precisely by their item
responses, so that the likelihoods induced for $\theta$, and the
consequent posterior distributions $p(\theta \mid x,y,z)$ from
which their imputations were drawn, were more dispersed.
Estimated means from such subpopulations tended to vary more
widely across completed datasets than would means for groups of
similar size whose typical members were measured more accurately
by their item responses.


Biases  in  Secondary  Analyses 

The completed datasets described above were constructed so as
to focus their accuracy on the marginal subpopulation means
featured in The Reading Report Card. As discussed in a previous
section, however, analyses involving survey variables that were
not conditioned on are subject to bias. An opportunity soon arose
to examine such biases. A report on reading proficiency levels
among pupils whose primary language was not English came due, the
analysis plan for which specified multiple regression analyses
involving both variables that had and variables that had not been
conditioned on when imputations for the original analysis were
constructed. Aware that the proposed analyses were sensitive to
failure to condition, the NAEP staff created new completed
datasets in which all the variables required in the analyses were
conditioned upon. This was made possible by Sheehan's (1985)
M-GROUP computer program for estimating $\Gamma$ and $\sigma^2$
in larger models. Some fifty effects were included in the
recalculations for each age. The same multiple regression
analyses were carried out twice, once with the original completed
datasets and once with new ones created with extended
conditioning vectors. The results of one such comparison are
summarized in Table 8. Baratz-Snowden and Duran (1987) give the
final results of all runs.


Table 8 about here


It may be seen that multiple regression coefficients for the
two effects which were conditioned upon in both analyses were
least affected in the recalculation, differing by only 4 and 9
percent of their new estimated values. Coefficients for
significant effects not originally conditioned on, however,
differed by between 10 and 40 percent. The direction of the
difference, in nearly all of these cases, was that the original
estimates were shrunken toward zero. Performing the analysis on
the original completed datasets would correctly inform the
researcher about the directions of effects, but would tend to
underestimate their magnitudes by an average of 30 percent.
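
The %-attenuation entries in Table 8 are reproduced by the quantity 100 x (full - partial) / full, where "partial" and "full" are the coefficients estimated from the original and the extended completed datasets. A minimal Python check of that reading, using coefficient pairs copied from Table 8 (the function and variable names are illustrative):

```python
def attenuation_pct(partial, full):
    """Percent by which the partial-conditioning estimate falls short of the full one."""
    return 100.0 * (full - partial) / full

# (partial, full) coefficient pairs copied from Table 8
conditioned_in_both = {
    "Sex = male": (-8.55, -9.35),
    "Parent education": (6.03, 5.80),
}
significant_not_conditioned = {
    "White; language non-minority": (12.22, 13.72),
    "Asian; language non-minority": (9.54, 17.34),
    "Black; language non-minority": (-8.64, -10.82),
    "Home language minority": (-9.41, -13.78),
    "Study aids": (2.63, 3.89),
    "Homework": (2.78, 3.82),
    "Hours of TV": (-1.22, -2.04),
    "Pages read": (6.36, 10.59),
    "Years academic courses": (0.91, 1.25),
}

pct_both = {k: attenuation_pct(p, f) for k, (p, f) in conditioned_in_both.items()}
pct = {k: attenuation_pct(p, f) for k, (p, f) in significant_not_conditioned.items()}
mean_pct = sum(pct.values()) / len(pct)   # average over the nine effects above
```

The average over the nine significant effects that were not originally conditioned on comes to roughly 30 percent, in line with the figure quoted in the text.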

Extensions 

The  experience  with  multiple  imputation  procedures  gained 
with  the  1984  reading  assessment  led  to  a  number  of  insights  on 
how  to  extend  or  improve  the  procedures.  Four  are  mentioned 
below. 

Multivariable Imputation. The preceding discussions have
concentrated on the case of a single latent variable. While this
proved adequate for summarizing reading data, both empirical and
theoretical evidence demonstrate the need for multiple scales in
broader content areas such as mathematics and science. NAEP
extended multiple imputation procedures to the case of four
variables in the Young Adult Literacy Assessment (Kirsch and
Jungeblut, 1986), and later applied these procedures to five
subscales each to analyze the 1986 mathematics and science
assessments (Beaton, 1988).

In the multivariable case, each latent variable (a different
aspect of, say, literacy skill) is defined through an IRT model.
Assuming conditional independence, the four-dimensional likelihood
$p(x_1,\ldots,x_4 \mid \theta_1,\ldots,\theta_4)$ is simply the
product of the four univariate IRT likelihoods, or
$\prod_k p(x_k \mid \theta_k)$. The conditional distributions
$p(\theta_1,\ldots,\theta_4 \mid y,z)$ are generally not
independent, however, their associations reflecting population
correlations among skills. The predictive distributions
$p(\theta_1,\ldots,\theta_4 \mid x_1,\ldots,x_4,y,z)$ from which
imputations are drawn reflect these associations. Compared to
carrying out imputation procedures separately within each scale,
the multivariable solution exploits information from all scales
to strengthen inferences about each, and yields consistent,
rather than attenuated, estimates of association among the
scales.
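
The structure of the predictive distribution can be written out in one line. The step assumes, as in (24), that item responses are independent of the background variables once the latent variables are given; the index $k$ over scales is notation introduced here for illustration.

```latex
\begin{aligned}
p(\theta_1,\ldots,\theta_4 \mid x_1,\ldots,x_4, y, z)
  &\propto p(x_1,\ldots,x_4 \mid \theta_1,\ldots,\theta_4)\,
           p(\theta_1,\ldots,\theta_4 \mid y, z) \\
  &= \Big[\prod_{k=1}^{4} p(x_k \mid \theta_k)\Big]\,
     p(\theta_1,\ldots,\theta_4 \mid y, z).
\end{aligned}
```

The likelihood factors by scale, but the second term does not, so responses on one scale shift the predictive distribution for every correlated scale.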

Conditioning on Principal Components. An aspect of multiple
imputation procedures that requires improvement is the accuracy of
multiple regression analyses that include nonconditioned
background variables. Conditioning on more background variables
with Sheehan's (1985) improved M-GROUP program certainly increases
the number of secondary analyses whose accuracy will be improved,
but research is currently under way to determine how to choose
them wisely. As mentioned above, conditioning on well-chosen
linear combinations of large numbers of variables holds promise.
Analyses of the 1988 data will examine gains in accuracy
attainable with different combinations of numbers of effects
conditioned upon explicitly and effects conditioned on partially,
through principal components.
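
As a rough illustration of conditioning on linear combinations, the following Python sketch reduces a matrix of background effects to a few principal component scores that could then serve as the conditioning vector. It is a generic principal components computation, not the procedure used or planned for NAEP; the random `background` matrix, its dimensions, and the choice of three components are hypothetical.

```python
import numpy as np

def principal_component_scores(background, n_components):
    """Scores on the leading principal components of the background variables.

    background   : (students x effects) matrix of coded background variables
    n_components : number of linear combinations to keep for conditioning
    """
    centered = background - background.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the component directions,
    # ordered from largest to smallest singular value
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s ** 2 / (s ** 2).sum()           # share of total variance
    return scores, explained[:n_components]

rng = np.random.default_rng(1)
background = rng.normal(size=(200, 12))   # hypothetical: 200 students, 12 effects
scores, explained = principal_component_scores(background, n_components=3)
```

Each retained column is a linear combination of all twelve effects, so every effect is conditioned on at least partially even though only three conditioning variables enter the model.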

Accounting for Uncertainty in $\alpha$ and $\beta$. Analyses to
date have taken estimates of item parameters $\beta$ and
conditional distribution parameters $\alpha$ as known, thereby
neglecting the component of uncertainty associated with them.
Present indications are that the resulting overstatement of
precision is negligible because of the huge sample sizes from
which these effects are estimated, but research is under way to
develop efficient methods for incorporating uncertainty about
them as well as about $\theta$. One approach leans on asymptotic
results, drawing from multivariate normal $(\alpha,\beta)$
distributions with means given by MLEs and variances given by
inverses of information matrices. An alternative approach is to
draw from multivariate distributions whose mean and variance
matrix were obtained by a jackknife procedure. The latter is more
intensive computationally, but captures variation due to lack of
model fit as well as due to sampling.
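
The first approach can be sketched as follows in Python. The stacked parameter vector `mle` and the matrix `information` are made-up numbers chosen only so that the example runs; in practice they would come from the estimation programs. One parameter draw would be made per completed dataset, before the $\theta$ imputations for that dataset are drawn.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical MLEs for a stacked (alpha, beta) vector and a hypothetical
# information matrix; neither is taken from the NAEP analyses.
mle = np.array([0.16, 0.46, 1.10, -0.30])
information = np.array([
    [400.0,  20.0,   0.0,   0.0],
    [ 20.0, 350.0,   0.0,   0.0],
    [  0.0,   0.0, 900.0, -50.0],
    [  0.0,   0.0, -50.0, 800.0],
])

# Asymptotic covariance matrix: inverse of the information matrix,
# symmetrized to remove rounding asymmetry from the inversion.
cov = np.linalg.inv(information)
cov = (cov + cov.T) / 2

# One draw of (alpha, beta) for each of M = 5 completed datasets.
parameter_draws = rng.multivariate_normal(mle, cov, size=5)
```

The jackknife alternative described in the text would replace `mle` and `cov` with the mean and variance matrix of the jackknife replicate estimates, leaving the drawing step unchanged.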
The Average Response Method (ARM). To analyze the data from
the 1984 NAEP survey of writing, Beaton and Johnson (1987) worked
out multiple imputation procedures for the setting of general
linear models. They address the problem of characterizing the
distribution of the average of ratings over all writing
exercises. This is a straightforward problem when every examinee
is presented all exercises, but effectively a latent variable
problem under an item-sampling design in which each examinee
takes only a few exercises from the pool. Computational
procedures are simpler under the ARM than with IRT models because
the assumed linearity of relationships permits noniterative
unweighted least squares solutions. Expressions for estimation,
imputation, and expectation in secondary analyses offer insight
into the problem for those familiar with the theory of general
linear models.

Conclusion 

At  the  beginning  of  the  decade,  Bock,  Mislevy,  and  Woodson 
(1982)  hailed  item  response  theory  as  a  cornerstone  of  progress 
for  educational  assessment.  Assuming  that  one  can  manage  the 
challenges  of  control  and  consistency  that  arise  in  any  study  that 
extends  over  time,  IRT  does  indeed  make  it  possible  to  solve  many 
practical  problems  in  assessment,  such  as  allowing  item  pools  to 
evolve  over  time,  providing  results  on  a  consistent  scale  in  the 
face of complex item-sampling designs, and reducing the numbers of
items  students  are  presented. 


Possible,  but  not  necessarily  easy.  Familiar  IRT  procedures, 
based  on  obtaining  point  estimates  for  individual  examinees,  break 
down  in  efficient  assessments  that  solicit  relatively  few 
responses  from  each  student.  This  paper  and  others  on  IRT  in 
assessment  (e.g.,  Mislevy  and  Bock,  1988)  make  it  clear  that 
higher  levels  of  theoretical  and  computational  complexity  are 
required  to  realize  the  benefits  IRT  offers. 

This  paper  argues  that  Rubin's  (1987)  multiple  imputation 
procedures  provide  a  suitable  theoretical  framework  for  latent 
variables  in  sample  surveys,  and  illustrates  how  the  procedures 
can  be  applied.  This  method  has  the  advantage  of  placing  the 
burden  of  the  problem  on  the  primary  analyst,  who  must  create 
completed  datasets.  With  them,  the  secondary  analysts  can  carry 
out  their  research  using  standard  routines  for  complete  data. 
Table 2

Numerical Values for a Short Assessment Instrument

                                                          Dependent Variable
Population                      $\theta$ and
Attribute                       $\tilde\theta$    $\hat\theta$    $\bar\theta_{xy}$    $\bar\theta_x$    $\bar\theta_x^{(r)}$

Mean                             .000              .000            .000                 .000              .000
Variance                        1.000             2.000            .600                 .500             1.000
Simple Regression Coefficient    .500              .500            .500                 .250              .354
Residual Variance                .750             1.750            .350                 .438              .875
% variance accounted for         .250              .125            .417                 .125              .125

Table 3

Numerical Values for a Long Test

                                                          Dependent Variable
Population                      $\theta$ and
Attribute                       $\tilde\theta$    $\hat\theta$    $\bar\theta_{xy}$    $\bar\theta_x$    $\bar\theta_x^{(r)}$

Mean                             .000              .000            .000                 .000              .000
Variance                        1.000             1.100            .940                 .910             1.000
Simple Regression Coefficient    .500              .500            .500                 .455              .477
Residual Variance                .750              .850            .690                 .703              .773
% variance accounted for         .250              .227            .266                 .227              .227
Table 4

Estimated Proportions Above Cut Point

                                                            Dependent Variable
Population                        $\theta$ and
Attribute                         $\tilde\theta$    $\hat\theta$    $\bar\theta_{xy}$    $\bar\theta_x$    $\bar\theta_x^{(r)}$

Short Test
  $P(\theta > 1)$                 .1587             .2611           .0985                .0778             .1587
  $P(\theta > 1 \mid y_1 = -1)$   .0418             .1292           .0055                .0294             .0735

Long Test
  $P(\theta > 1)$                 .1587             .1711           .1515                .1469             .1587
  $P(\theta > 1 \mid y_1 = -1)$   .0418             .0516           .0351                .0409             .0465

Table 5

Expected Regression Coefficients for a Short and a Long Test,
with Complete and Incomplete(1) Conditioning for Imputations

                                          Dependent Variable
Population             $\theta$ and        $\tilde\theta^*$      $\tilde\theta^*$
Attribute              $\tilde\theta$      ($\rho_x = .50$)      ($\rho_x = .91$)

Simple regression
  $\beta_1$            .500                .500                  .500
  $\beta_2$            .500                .357                  .471

Multiple regression
  $\beta_{1|2}$        .333                .429                  .353
  $\beta_{2|1}$        .333                .143                  .294

(1) Imputations constructed by conditioning on $y_1$ but not $y_2$.
Table 6

Estimates of Conditional Distribution Parameters

                                              Grade 4/    Grade 8/    Grade 11/
Effect        Level                           Age 9       Age 13      Age 17

Intercept     All subjects                    -1.351       -.433        .159

Gender        Male*                             .000        .000        .000
              Female                            .096        .139        .160

Ethnicity     Black*                            .000        .000        .000
              White & other                     .460        .403        .405
              Hispanic                          .076        .113        .135

STOC          Rural or Low metro*               .000        .000        .000
              High metro                        .490        .308        .230
              Other                             .245        .122        .148

Region        Northeast*                        .000        .000        .000
              Central                          -.133       -.042        .028
              Southeast                        -.009       -.021        .023
              West                             -.087       -.042        .005

Parent Ed.    Less than HS*                     .000        .000        .000
              High school grad                  .209        .140        .082
              Beyond HS                         .395        .404        .379
              Don't know/missing                .120       -.017       -.075

Grade/Age**   < M age, = M grade*               .000        .000        .000
              = M age, < M grade               -.672       -.433       -.617
              = M age, = M grade               -.065       -.013       -.084
              = M age, > M grade                .338        .549        .077
              > M age, = M grade               -.307       -.260       -.533

Misc.         Subjects with unrecoverable
              missing values                    .510       -.329        .810

Residual variance                               .464        .386        .457

Sample size                                   22,950      23,553      23,932

*  Effect fixed at zero.
** "M" denotes "modal"; e.g., "> M age, = M grade" means "above
   the modal age and at the modal grade in one's age/grade
   cohort."



Table 7

Estimating Age 17 Means from Completed Datasets

                                             Total      Males      Black      West       Rural

(1)   Imputation 1                           288.005    282.644    266.195    287.177    282.990
(2)   Imputation 2                           288.258    283.201    265.104    288.338    283.499
(3)   Imputation 3                           288.208    282.869    265.259    288.018    283.285
(4)   Imputation 4                           288.135    282.554    264.832    287.745    282.854
(5)   Imputation 5                           287.819    282.314    264.241    287.196    282.360

(6)   Average (1)-(5)                        288.085    282.718    265.126    287.695    282.997
(7)   Variance (1)-(5)                          .025       .092       .406       .208       .152
(8)   Average jackknife variance               1.248      1.225      1.742      4.333      9.218
(9)   Total variance, 1.2 x (7) + (8)          1.278      1.335      2.229      4.583      9.400
(10)  Proportional increase, [(9)-(8)]/(8)      .024       .090       .280       .058       .020
Table 8

Multiple Regression Estimates based on Imputations
Constructed with Partial and Full Conditioning

                                Partial     Full Conditioning                   %-attenuation
Effect                          $\beta$     $\beta$     SE($\beta$) t           all         significant

White; language minority          6.08        4.23       3.96         1.07      -43.74
White; language non-minority     12.22       13.72       2.94         4.67       10.93       10.93
Hispanic; lang. non-minority      -.76        1.22       3.25          .38      162.30
Asian; language minority         -2.90       -6.25       4.39        -1.42       53.60
Asian; language non-minority      9.54       17.34       4.66         3.72       44.98       44.98
Black; language non-minority     -8.64      -10.82       2.95        -3.67       20.15       20.15
Sex = male*                      -8.55       -9.35        .80       -11.69        8.56        8.56
Parent education*                 6.03        5.80        .38        15.26       -3.97       -3.97
Home language minority           -9.41      -13.78       2.78        -4.96       31.71       31.71
Study aids                        2.63        3.89        .43         9.05       32.39       32.39
Homework                          2.78        3.82        .30        12.73       27.23       27.23
Hours of TV                      -1.22       -2.04        .24        -8.50       40.20       40.20
Pages read                        6.36       10.59       1.01        10.49       39.94       39.94
Years academic courses             .91        1.25        .14         8.93       27.20       27.20

* Effect included in partial conditioning set

